Introduction

A  new  programming  technology  has  been  growing  up  around  the 
problem  of  how  to  transfer  human  expertise  in  given  domains  into  effective 
machine  form,  so  as  to  enable  computing  systems  to  perform  convincingly  as 
advisory  consultants.   Expert  systems  development,  confined  during  the  past 
decade to academic laboratories, is becoming cost-effective. The reasons are
partly advances in semiconductor technology and partly the development of
well-understood methodologies for "knowledge-based" programming.

A few examples may be in order to illustrate the kinds of consultant
skills under discussion (Figure 1). MOLGEN [23] interactively aids molecular
geneticists in the planning of DNA-manipulation experiments. VM ("Ventilator
Management") [15] gives real-time advice for the management of patients
undergoing mechanical ventilation in the intensive care unit of the Pacific
Medical Center. PUFF [21] interprets results of pulmonary function tests in
use in the same center. SACON [3] guides engineers in the use of a large
program which integrates structural analysis procedures. PROSPECTOR [18]
advises when and where to drill for ore. DENDRAL [8] takes the pattern
generated by subjecting an unknown organic chemical to a mass spectrometer,
and infers the molecular structure. SECS [34] uses a "knowledge-base" of
chemical transforms to propose schemes for synthesizing stated compounds.
End-game expert systems deploy and discuss chess-master knowledge and
generate improved teaching texts. MYCIN [32] and INTERNIST [29] out-perform
clinical consultants within the limited domains of expertise of these
programs.


Medicine

  •  MYCIN, for identification of bacteria in blood and urine samples,
     and prescription of antibiotic regime.
     Shortliffe, at Stanford Medical School, USA.

  •  INTERNIST, for diagnosis in internal medicine.
     Myers and Pople at Pittsburgh University, USA.

  •  Intensive care ("iron-lung").
     VM (Fagan and others)

  •  Interpretation of lung tests.
     PUFF (Kunz)

Chemistry

  •  DENDRAL, for identification of organic compounds.
     Feigenbaum, Lederberg, Djerassi, Buchanan, Carhart and others.
     Stanford University, USA.

  •  SECS system for designing organic syntheses.
     Wipke, University of California at Santa Cruz, USA.

Molecular genetics

  •  MOLGEN (Lederberg, Martin, Friedland, King, Stefik)

Other

  •  Consultancy for structural engineers.
     SACON (Bennett)

  •  Consultancy for mineral prospecting.
     PROSPECTOR (Hart, Duda, Einaudi)


Figure 1.  Examples of expert systems.


Six  Facts  of  Today's  World 

1.  The  market  for  consultancy  demands  specialists,  not  generalists:   this 
applies  to  automated  consultancy  too. 

2.  Real-time operation is in some applications not just desirable but
essential (see the reference to VM earlier).

3.  A consultant's skill consists to an important degree of asking the client
the right follow-up questions, as the outlines of the case take shape.

4.  Unless  the  program  can  do  this,  and  can  also  explain  its  steps  on  demand, 
client  confidence  collapses. 

5.  An  expert  system  acts  as  a  systematizing  repository  over  time  of  the 
knowledge  accumulated  by  many  specialists  of  diverse  experience.   Hence 
it  can  and  does  ultimately  attain  a  level  of  consultant  expertise 
exceeding  that  of  any  single  one  of  its  "tutors." 

6.  Program  text  in  the  ordinary  sense  is  an  unsuitable  and  unpopular  medium 
for  the  description  and  communication  by  human  experts  of  their  expertise. 
"Advice  languages"  are  needed. 

Nature  of  Knowledge-based  Expert  Systems 

Expert systems are not, and owing to the complexity of their tasks
cannot be, either procedure-driven in the ordinary sense or data-driven,
although they can all be fairly described as database-driven. The great bulk
of the
database,  however,  is  typically  made  up  of  rules  which  are  invoked  by 
pattern-match  with  features  of  the  task-environment  and  which  can  be  added 
to,  modified  or  deleted  by  the  user.   A  database  of  this  special  type  is 
ordinarily  called  a  "knowledge-base,"  and  its  existence  determines  that  there 
are  three  different  user-modes  for  an  expert  system  in  contrast  to  the  single 
mode  (getting  answers  to  problems)  characteristic  of  the  more  familiar  type 
of  computing  (see  Figure  2): 
(1)  getting answers to problems: user as client;

(2)  improving or increasing the system's knowledge: user as tutor;

(3)  harvesting the knowledge-base for human use: user as pupil.

Users of an expert system in mode (2) are known as "domain specialists."
It is not possible to build an expert system without one.

In  the  early  days  it  was  thought  that  computer-based  consultation 
would  be  confined  to  the  conventional  performance  goal  of  earlier  computing 
technologies,  namely  to  deliver  good  answers  to  the  client's  input  questions. 
This  turned  out  not  to  be  the  whole  picture. 
First:   the  client  demands  explanations  as  well  as  answers. 
Second:   the  system  is  typically  required  to  have  facilities  for  improving 
and  refining  its  knowledge  from  tutorial  interaction  with  the  client's  own 
domain  specialists. 

Third:   an  additional  feature  has  come  to  the  fore,  whereby  the  system  can  be 
made  to  generate  improved  codifications  of  domain-specific  knowledge  for 
human  use. 

A  typical  explanation  facility  allows  the  user  to  challenge  and 
examine  the  reasoning  process  underlying  the  system's  answers.   The  dialogue 
shown  in  Figure  3  is  excerpted  from  a  session  with  MYCIN,  a  program 
knowledgeable  about  diagnosis  and  antibiotic  therapy  in  certain  classes  of 
infectious  diseases.   This  record  contains  two  examples  of  the  system's 
ability  to  handle  questions  of  the  form  "how  did  you  work  that  out?"   In 
the  first,  the  user  types  "RULE"  and  receives  an  English-language  version 
of  the  last  rule  to  be  executed.   In  the  second  example  the  user-command  "WHY" 
triggers  a  backward  trace  of  the  inference  process  which  fired  the  system's 
last  question.   The  number  of  the  last  rule  is  also  given  in  case  the  user 
wishes to retrieve and examine it. Further backward tracing could be
activated by repeated use of "WHY" and "RULE" commands before proceeding
with the main dialogue.


    22)  Was any chemotherapy recently begun specifically to combat
         ORGANISM-1?
    ** NO

    23)  Is Jeremiah Sample (PATIENT-1) a burn patient?
    ** RULE

    RULE047

    IF:    1)  THE SITE OF THE CULTURE IS BLOOD, AND
           2)  THE IDENTITY OF THE ORGANISM IS NOT KNOWN
               WITH CERTAINTY, AND
           3)  THE STAIN OF THE ORGANISM IS GRAMNEG, AND
           4)  THE MORPHOLOGY OF THE ORGANISM IS ROD, AND
           5)  THE PATIENT HAS BEEN SERIOUSLY BURNED
    THEN:  THERE IS WEAKLY SUGGESTIVE EVIDENCE (.4) THAT
           THE IDENTITY OF THE ORGANISM IS PSEUDOMONAS

    ** NO

    24)  Is the patient's illness with ORGANISM-1 a hospital-acquired
         infection?
    ** YES

    25)  Do you consider Jeremiah Sample (PATIENT-1) to be a
         compromised host? (alcoholic, immunosuppressed, etc.):
    ** NO

    26)  Has Jeremiah Sample (PATIENT-1) had any gu surgery or
         manipulation of the urinary tract? (e.g., indwelling catheter,
         cystoscopy, etc.)
    ** WHY

    I AM TRYING TO FIND OUT ABOUT THE IDENTITY OF THE ORGANISM.

    I ALREADY KNOW THAT:
         THE PORTAL OF ENTRY OF THE ORGANISM IS UPPER-GU
         THE MORPHOLOGY OF THE ORGANISM IS ROD
         THE STAIN OF THE ORGANISM IS GRAMNEG
         THE SITE OF THE CULTURE IS BLOOD

    THEREFORE, IF:
         1)  THE PATIENT HAS NOT HAD A GENITO-URINARY MANIPULATIVE
             PROCEDURE, AND
         2)  URINARY-TRACT-INFECTION IS NOT A PROBLEM FOR WHICH
             THE PATIENT HAS BEEN TREATED
    THEN:
         THERE IS SUGGESTIVE EVIDENCE (.6) THAT THE IDENTITY OF THE
         ORGANISM IS E.COLI
    (RULE156)


Figure 3.  Excerpt from a consultation with the MYCIN program [32].

The supporting software framework is of a type normally called a
"production system" [13]: a modular collection of rules, together with a
control structure. Each rule has a condition part consisting of a conjunction
of patterns C1, C2, etc., paired with an action part (A1, A2, etc.) according
to the general scheme shown in Figure 4. The list of rules is searched
for the subset whose condition parts are satisfied ("matched") by the current
state of the database. The retrieved candidate set is processed to detect
any conflicts and to resolve them by elimination of rules from the candidate
set. The first rule of the reduced set is executed. An action part can be
an action, e.g. "print disease-name," or a logical, numerical or other value,
or it can be an action-sequence or an action-scheme, goal-list or other
advice structure used to guide an action-generating module. Typically
execution of the action part of a rule modifies the state of the database.
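
The recognize-act cycle just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
implementation of any system named in this paper: the names Rule, match,
resolve and recognize_act are hypothetical, conditions are modelled as plain
strings held in a set, and the tie-breaking algorithm (left unspecified in
the text) is assumed to be "take the first matching rule in listing order."
The data reproduce the database and rules of Figure 4.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Optional, Set, Tuple

Database = Set[str]


class Rule(NamedTuple):
    name: str
    conditions: FrozenSet[str]           # conjunction of patterns C1, C2, ...
    action: Callable[[Database], None]   # action part; may modify the database


def make_action(label: str) -> Callable[[Database], None]:
    """Build a toy action that records its own execution in the database."""
    def act(db: Database) -> None:
        db.add(label + " executed")
    return act


# Rules of Figure 4 (rule names R1..R5 are invented for this sketch).
RULES: List[Rule] = [
    Rule("R1", frozenset({"C1", "C2"}), make_action("A1")),
    Rule("R2", frozenset({"C3"}), make_action("A2")),
    Rule("R3", frozenset({"C1", "C3"}), make_action("A3")),
    Rule("R4", frozenset({"C4"}), make_action("A4")),
    Rule("R5", frozenset({"C5"}), make_action("A5")),
]


def match(rules: List[Rule], db: Database) -> List[Rule]:
    """RECOGNIZE: collect rules whose whole condition part is in the database."""
    return [rule for rule in rules if rule.conditions <= db]


def resolve(conflict_set: List[Rule]) -> Optional[Rule]:
    """Conflict resolution. Assumption: the first rule in listing order wins."""
    return conflict_set[0] if conflict_set else None


def recognize_act(rules: List[Rule], db: Database) -> Tuple[List[str], Optional[str]]:
    """Run one recognize-act cycle and report what happened."""
    conflict_set = match(rules, db)
    selected = resolve(conflict_set)
    if selected is not None:
        selected.action(db)              # ACT: execution modifies the database
    names = [rule.name for rule in conflict_set]
    return names, (selected.name if selected else None)


database: Database = {"C5", "C1", "C3"}
conflict_names, selected_name = recognize_act(RULES, database)
```

With the Figure 4 database the conflict set holds the three rules whose
conditions are present, and the assumed tie-break selects the rule C3 -> A2.
A different conflict-resolution policy would change only the resolve
function.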

Rule-based  Inference 

The  deductive  inferences  performed  by  MYCIN  in  the  process  of 
answering  the  user's  questions  follow  a  control  scheme  known  as  "backward 
chaining."   To  get  an  idea  of  how  this  works,  consider  a  simple  set  of  rules 
(Figure  5)  in  which  letters  from  the  alphabet  have  been  substituted  for 
"facts." 

1)  A  &  B  ->  F 

2)  C  &  D  ->  G 

3)  E  ->  H 

4)  B  &  G  ->  J 

5)  F  &  H  ->  X 

6)  G  &  E  ->  K

7)  J  &  K  ->  X 

The  arrow  "->"  implies  "THEN,"  thus  the  first  rule  reads 
If  A  is  true  AND  B  is  true  THEN  F  is  true. 


RECOGNIZE

    DATABASE:   C5   C1   C3

    PRODUCTION RULES:
        (C1 & C2)  ->  A1
        C3         ->  A2
        (C1 & C3)  ->  A3
        C4         ->  A4
        C5         ->  A5

    Match yields the CONFLICT SET:
        C3         ->  A2
        (C1 & C3)  ->  A3
        C5         ->  A5

    Conflict Resolution yields the SELECTED RULE:
        C3         ->  A2

ACT

    Execution:  C3 -> A2   (A2 executed)


At the given instant the database contains the system's model,
in the form of an implicit conjunction of conditions, of the
state of the task environment. "(C1 & C2) -> A1" means "if the
condition (C1 & C2) matches the database, then execute A1."
Conflict resolution is the task of a tie-breaking algorithm,
not specified here.


Figure 4.  Production system "recognize-act" cycle.


In the simple set of rules below, letters of the
alphabet have been substituted for "facts."

    1)  A & B  ->  F
    2)  C & D  ->  G
    3)  E      ->  H
    4)  B & G  ->  J
    5)  F & H  ->  X
    6)  G & E  ->  K
    7)  J & K  ->  X

"P & Q -> R" means "If P is true and Q is true then R is true."

We discover that B, C, D and E are true:
is X therefore true?


Figure 5.  Production system used as a deduction engine;
in this "backward chaining" mode matching is
done on the right-hand rather than the
left-hand parts of rules.



Suppose  that  in  a  particular  case  we  discover  by  observation  that 
"facts"  B,  C,  D  and  E  are  "true,"  and  we  wish  to  discover  if  X  is  therefore 
true. 

The program will consider those rules which could be used to infer
the truth of X, i.e. those rules (5, 7) which have an X on the right-hand
side of the arrow. Each such rule is tested to see if each of the facts on
the left-hand side is known to be true, any unknown fact being treated in
the same way as the original fact X; i.e. we proceed by recursion.

Thus:

X  may  be  deduced  from  Rule  5  or  Rule  7 

1)  Starting  with  Rule  5,  are  F  &  H  true? 

2)  F  can  be  shown  to  be  true  if  A  &  B  are  both  true  (Rule  1) 

3)  A  is  not  known  to  be  true,  so  this  attempt  fails 

4)  Continuing  with  Rule  7,  are  J  &  K  true? 

5)  J  can  be  shown  to  be  true  if  B  &  G  are  both  true  (Rule  4) 

6)  B  is  known  to  be  true  "a  priori" 

7)  G  can  be  shown  to  be  true  if  C  &  D  are  both  true  (Rule  2) 

8)  C  is  known  to  be  true  "a  priori" 

9)  D  is  known  to  be  true  "a  priori" 

10)  therefore  G  is  true 

11)  therefore  J  is  true 

12)  K  can  be  shown  to  be  true  if  G  &  E  are  both  true  (Rule  6) 

13)  G  is  already  known  to  be  true  (step  10) 

14)  E  is  known  to  be  true  "a  priori" 

15)  therefore  K  is  true

16)  therefore  X  is  true. 
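
The recursion traced above can be written down directly. The following is a
minimal sketch, assuming rules are stored as (premises, conclusion) pairs
and tried in listing order; the function name prove and the loop guard are
assumptions of this sketch and do not describe MYCIN's actual control
structure, which also handles certainty factors and questions to the user.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# The seven rules of Figure 5 as (premises, conclusion) pairs.
RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("A", "B"), "F"),   # 1
    (("C", "D"), "G"),   # 2
    (("E",), "H"),       # 3
    (("B", "G"), "J"),   # 4
    (("F", "H"), "X"),   # 5
    (("G", "E"), "K"),   # 6
    (("J", "K"), "X"),   # 7
]


def prove(goal: str,
          facts: Set[str],
          rules: List[Tuple[Tuple[str, ...], str]],
          pending: Optional[FrozenSet[str]] = None) -> bool:
    """Backward chaining: try to establish goal from facts via rules.

    Facts derived along the way are added to facts, so each is proved once.
    """
    if goal in facts:
        return True                      # known "a priori" or already derived
    pending = pending or frozenset()
    if goal in pending:
        return False                     # guard against circular rule sets
    for premises, conclusion in rules:
        # Match on the right-hand side, then recurse on each left-hand fact.
        if conclusion == goal and all(
                prove(p, facts, rules, pending | {goal}) for p in premises):
            facts.add(goal)
            return True
    return False


known = {"B", "C", "D", "E"}
x_is_true = prove("X", known, RULES)
```

As in the trace, the attempt through Rule 5 fails because A is unknown, and
X is then established through Rule 7 after G, J and K have been derived.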



The above simple deductive technique is the basis of MYCIN's
reasoning.   The  technique  is  powerful  and  efficient  while  at  the  same  time 
very  general  and  easily  comprehended. 

"Learning"  Expert  Systems 

The rule-based structure of expert systems facilitates acquisition
by the system of new rules and modification of existing rules, not only by
tutorial interaction with a domain specialist but also by autonomous
"learning." An early example was a self-taught pole-balancer developed by
Michie and Chambers [26] on the basis of 225 condition-action rules.
De Dombal's [14] diagnosis program acquires its medical expertise by
statistical induction over patient records with confirmed diagnoses.
Michalski's [22,24] AQ11 program acquires diagnostic expertise by logical
induction. Logical induction, under different formalisms, is also the basis
of the successes scored for machine learning in chemistry by Meta-DENDRAL [9]
and in robot vision by the Edinburgh FREDDY system [2]. Finally Quinlan's [30]
latest version, ID3, of Hunt's [19] CLS algorithm recently synthesized
inductively, in a few seconds of machine time, solutions to classification
problems which had proved intractable as tasks of hand-synthesis and coding.
A connection thus appears between machine learning and automatic
programming. This connection gains interest from the fact that recent runs
of ID3 [36] have synthesized programs (in the form of decision trees) which
perform classification tasks more than five times faster than the best
hand-coded program. Various "learning" expert systems are listed in
Figure 6.
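
To make the idea of inductively synthesizing a decision tree concrete, here
is a minimal sketch in the spirit of ID3. It is not Quinlan's program: it
assumes the familiar entropy-based choice of the attribute to test at each
node, and every name in it (induce, classify, the descriptors "spots" and
"wilting," and the four training cases) is invented purely for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

Case = Dict[str, str]                    # descriptor name -> observed value
Example = Tuple[Case, str]               # (case, confirmed class)
Tree = Union[str, tuple]                 # leaf label, or (attribute, branches)


def entropy(examples: List[Example]) -> float:
    """Shannon entropy of the class labels in a set of examples."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def split(examples: List[Example], attr: str) -> Dict[str, List[Example]]:
    """Partition the examples by their value for one descriptor."""
    parts: Dict[str, List[Example]] = {}
    for example in examples:
        parts.setdefault(example[0][attr], []).append(example)
    return parts


def induce(examples: List[Example], attrs: List[str]) -> Tree:
    """Grow a decision tree top-down from examples and counter-examples."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class

    def remaining_entropy(attr: str) -> float:
        parts = split(examples, attr).values()
        return sum(len(p) / len(examples) * entropy(p) for p in parts)

    best = min(attrs, key=remaining_entropy)          # most informative test
    rest = [a for a in attrs if a != best]
    return (best, {value: induce(part, rest)
                   for value, part in split(examples, best).items()})


def classify(tree: Tree, case: Case) -> str:
    """Run the synthesized tree as a classification program."""
    while not isinstance(tree, str):
        attr, branches = tree
        tree = branches[case[attr]]
    return tree


# Invented toy training set, not data from any study cited here.
training: List[Example] = [
    ({"spots": "yes", "wilting": "no"}, "diseased"),
    ({"spots": "yes", "wilting": "yes"}, "diseased"),
    ({"spots": "no", "wilting": "yes"}, "healthy"),
    ({"spots": "no", "wilting": "no"}, "healthy"),
]
tree = induce(training, ["wilting", "spots"])
verdict = classify(tree, {"spots": "yes", "wilting": "no"})
```

On this toy set the induced tree tests only "spots," since that single
descriptor separates the classes; the resulting tree is itself the
classification program, which is the connection with automatic programming
drawn in the text.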

The system for soybean diagnosis [12] shown in the figure starts with
primitive descriptors from the expert pathologist and from these, and from
a training set of values for diseased plants with confirmed diagnoses,
synthesizes  a  set  of  diagnostic  rules.   The  unexpected  discovery  was  made 
that  a  machine-synthesized  set  of  rules  greatly  out-performed  those  developed 
by  the  plant  pathologist,  Dr.  Jacobsen,  who  acted  as  domain-specialist  and 
was  the  source  of  the  original  set  of  primitive  descriptors.   Jacobsen  then 
attempted  to  improve  his  rules,  and  partially  succeeded  as  shown  in  the 
bottom  line  of  Figure  7.   Feeling  that  further  improvement  would  be  hard,  he 
discontinued  the  attempt  and  adopted  instead  the  machine-synthesized  set  as 
the  basis  of  his  subsequent  professional  work. 

One  way  of  summarizing  the  relation  between  inductive  "concept 
learning"  and  automatic  program  synthesis  is  diagrammed  in  Figure  8.   An 
unexpected  side-light  on  future  uses  for  inductive  learning,  additional  to 
the  obvious  ones,  is  cast  by  the  following  consideration.   As  memory 
continues  to  get  cheaper  faster  than  processing  power,  the  possibility  of 
encoding  industrially  useful  information  in  the  form  of  giant  look-up  tables 
will  begin  to  be  realized  in  commercial  practice.   In  many  cases  the  time- 
complexity  of  the  function  to  be  represented  in  the  table  makes  it  infeasible 
to  initialize  such  tables  in  the  obvious  way.   When,  however,  the  function 
has  a  low-complexity  inverse  (as  for  example  the  prime  factor  function  or 
the  function  mapping  from  mass  spectra  to  molecular  structures)  it  is  possible 
to  initialize  such  tables  "backwards,"  i.e.  by  enumerating  f's  y-domain  and 
using  the  inverse  computation  to  enter  the  elements  of  the  x-domain.   Look-up 
then  proceeds  "forwards."  Drawbacks  are: 

(1)  cluttering up memory with uninteresting and unwanted
x-values;

(2)  conceptual opacity of the resulting table to the human
domain-specialist.

Inductive inference techniques will be required for combatting both (1)
and (2). This theme is illustrated in Figure 9.


    (1)  CLIENT'S MENTAL PICTURE OF WHAT HE WANTS

             (A)  Systems analyst's task

    (2)  INTENSIONAL DEFINITION OF RELATION TO BE COMPUTED

             (B)  Programmer's task

    (3)  PRACTICAL DEFINITION OF RELATION (PROGRAM)

             (C)  Learner's task

    (4)  TUTORIAL DEFINITION OF RELATION (GUIDED SELECTION
         OF EXAMPLES AND COUNTER-EXAMPLES)

             (D)  Teacher's task

    (5)  EXTENSIONAL DEFINITION OF RELATION (ACTUAL OR
         HYPOTHETICAL DATABASE OF INSTANCES)


Figure 8.  A way of looking at the synthesis of a concept (description)
in the form of a program. The fundamental ideas underlying
this diagram are (1) that both the teacher and the programmer
have the task of conveying concepts to target devices which
are then called upon to apply the acquired concepts to new
data; and (2) that a symbolic definition is not the only kind
of definition which could be used as a formal specification
from which to build a program. A sufficient set of tutorial
instances could do the same job. In the case of the teacher's
task, the "target devices" are of course his or her human
pupils, fortunately equipped with rather good inductive
capabilities. Current research aims to equip the programmer's
target devices in something like the same way.
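
The "backwards" initialization of a look-up table described above can be
illustrated with the prime factor function, whose inverse (multiplication)
is cheap. The sketch below is a toy under stated assumptions: the function
name build_factor_table, the small list of primes, the cap on the number of
factors and the limit on x are all choices made for this example only.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import Dict, Sequence, Tuple


def build_factor_table(primes: Sequence[int],
                       max_factors: int,
                       limit: int) -> Dict[int, Tuple[int, ...]]:
    """Fill a table x -> prime factors of x without ever factorizing.

    The y-domain (multisets of primes) is enumerated, and the cheap inverse
    computation, multiplication, supplies the x under which each y is filed.
    """
    table: Dict[int, Tuple[int, ...]] = {}
    for k in range(1, max_factors + 1):
        for factors in combinations_with_replacement(primes, k):
            x = prod(factors)            # inverse of the hard function
            if x <= limit:               # filter out unwanted x-values
                table[x] = factors
    return table


table = build_factor_table(primes=(2, 3, 5, 7), max_factors=3, limit=100)

# Look-up then proceeds "forwards": from x to its factors.
factors_of_12 = table[12]
```

The limit test is a crude stand-in for the filtering of entries mentioned
under drawback (1); nothing here addresses drawback (2), the opacity of the
finished table, which is where inductive compaction would come in.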

Critical  Role  of  Patterns 

Knowledge-based computing systems seek to implement the consulting
skills of human experts. They answer questions in problem domains too
complex for "standard" hardware/software designs, but not so complex as to
be totally intractable. Study of the cognitive strategies of experts has
shown that performance in such domains, at least for human practitioners, is
not based on elaborate calculations but on the mental storage and use of
large incremental catalogues of pattern-based rules. Thus chess mastership
is gained through the acquisition and organization in memory of diagnostic
patterns, not through increases in calculating power. In Figure 10 the upper
two patterns illustrate the thematic categories of the sort found in the
early pages of a chess primer [35] ("fork" and "back-rank mate"
respectively). The lower two exhibit a single pattern differing by a minor
perturbation which happens to be critical. In the left-hand case a familiar
type of sacrificial attack on the King can be launched with impunity. In the
right-hand position it can be spiked at the last minute by the move B-Q6 by
Black, guarding White's intended Q x RP. The role of remembered patterns is
thus to propose a tactical idea. Detailed check-out by concrete analysis is
still required.

In Figure 11 some representative pattern-based skills are listed,
for four of which "expert systems" have been implemented. The approximate
number of patterns required for successful machine implementation is thus in
these four cases known. The last line contains estimates for a highly
sophisticated domain of human expertise where no comparable machine
expertise yet exists. Figure 12 shows the approximate numbers of patterns
required for a few of the fragments of the total chess domain for which
machine mastery has been achieved. These small sub-domains could of course
be (and have been) solved by brute-force enumeration. But this approach
yields representations which cannot support the intelligent query and
explanation facilities demanded by the user of knowledge-based systems.


    Processing/dollar grows by >10 per decade
    Memory/dollar grows by >10 per decade

    Efficiency of given problem-representation depends on
    RELATIVE COST:  processing / memory

    So what are we planning to do about it?

    ONE WAY  -  choose functions with easy inverses
                and build very large databases
                (say, trillion-bit). Possible e.g.
                for mass spec. in organic chemistry.

    RESULT   -  requirement generated for
                knowledge-engineering skills
                (1)  for filtering entries to the table
                (2)  for inductively compacting.

Figure 9.  A new situation precipitated by continuing hardware trends.


Figure 10.  Some "patterns" in chess.


SKILL                 NATURE OF IMPLEMENTATION         NO. OF PATTERN-BASED
                                                       RULES IN IMPLEMENTED
                                                       SYSTEM

Seeing a scene        Incremental catalogue of         10
                      visual patterns. Simple
                      scenes of shadowed polyhedra,
                      Waltz, early 1970s.

Balancing a pole      Incremental catalogue of         225
                      pattern-based rules.
                      Michie & Chambers, mid-1960s.

Identifying organic   Incremental catalogue of         c 400
compounds from        pattern-based rules.
mass spectra          'Dendral' program of
                      Lederberg, Feigenbaum and
                      Buchanan.

Identifying bacteria  Incremental catalogue of         c 400
from lab tests on     pattern-based rules.
blood and urine       'Mycin' program of
                      Buchanan & Shortliffe.

Calculating-prodigy   Alexander Aitken, studied by     ?
arithmetic            Hunter, 1962, used
                      pattern-based rules.

Grandmaster chess     Chess-masters, studied by        30,000-50,000
                      Binet, de Groot, Chase &
                      Simon [11], Nievergelt, use
                      pattern-based rules.

Figure 11.  Some pattern-based skills (condensed from Michie, 1976).


ENDING                     APPROXIMATE SIZE OF     NUMBER OF PATTERNS
                           THE PROBLEM SPACE       REQUIRED FOR AN
                                                   EXPERT SYSTEM

King and Rook              40,000                  10
against King [6]

King and Pawn              100,000                 20
against King [5]

King and Knight            2,000,000               30
against King and Rook

Figure 12.  Pattern-requirement of three small sub-domains of chess
grows slowly relative to increase of domain complexity.

Patterns  in  Computer  Vision 

In  chess  and  other  deterministic  combinational  domains  (such  as 
industrial  routeing  and  scheduling  in  various  OR  contexts)  the  power  of 
patterns  is  revealed  in  the  extraction  of  sense  from  otherwise  intractable 
explosions  of  combinatorial  complexity.   Figure  12  gives  a  hint  of  how 
well-chosen  pattern-sets  can  serve  this  function,  and  shows  that  a  relatively 
slow  growth  of  the  pattern-catalogue  can  maintain  control  over  a  wildly 
growing  problem  space.   In  some  other  perceptual  domains,  notably  vision, 
combinatorial  complexity  is  compounded  by  the  presence  of  sensory  noise, 
thus  putting  an  even  higher  premium  on  the  stored  pattern-base.   Even 
without low-level noise, perturbations can be severe. If Figure 13 were
viewed upside-down or on its side when first encountered, there is little
chance that the human eye and brain would "see" the Dalmatian dog drinking
from a puddle in a stone-strewn landscape [16]. The feat whereby sense is
extracted from noise rests on the fact first emphasized by Helmholtz in the
19th century that visual perception is an act of reconstruction of the
percept from a large repertoire of stored internal models. The rate of
input of visual information to the higher centers of the brain is not great
enough to do more than give hints and prompts for the reconstructive process.
We catch the mechanism in the act whenever we "see" in randomly blotched
surfaces pictures which are not "really" there: "similitudes of all sorts
of landscapes and figures in all sorts of actions," as Leonardo da Vinci
remarked.
Figure 13.  A "noisy" visual scene, interpretable by the human eye and
brain with the aid of a large stored set of patterns
(from Gregory, 1970).

Removal of noise is executed in two main phases: a pre-processing
phase in which knowledge of an essentially statistical nature is applied for
smoothing and cleaning up the raw picture, and a second phase, which the
figure has been selected to illustrate, in which semantic knowledge comes
into play. For handling knowledge of the second kind a commonly used
representational form is the semantic net, of which Figure 14 shows an early
example. In the context of machine expertise in vision, Helmholtz's "internal
models" popularized by Gregory thus receive specific and concrete realization.

There  is  a  practical  bearing  of  computer  vision  on  expert  systems 
work,  owing  to  the  need  from  time  to  time  to  resort  to  diagrammatic 
explanations  and  other  "picture  talk"  in  the  course  of  man-machine 
consultations.   If  a  medical  program  is  advising  on  a  case  of  acute  abdominal 
pain  it  would  be  advantageous  to  be  able  to  input  the  standard  diagram  of 
the  abdomen  from  the  patient's  notes,  filled  in  graphically  to  indicate 
regions  of  tenderness,  rigidity,  etc.,  rather  than  to  have  to  construct 
symbolic  circumlocutions.   Beyond  a  certain  level  of  complexity,  e.g.  in 
computerized  fault-diagnosis  in  the  production  machinery  of  an  oil  platform, 
the  task  of  circumlocution  can  become  intractable. 

Past  work  towards  supplying  such  needs  has  until  recently  been 
retarded  by  lack  of  highly  parallel  hardware  designs.   Computer  vision  can 
certainly  use  these.   Working  with  a  FORTRAN  emulator  of  Duff's  CLIP-3 
parallel  array  processor,  Armstrong  and  Jelinek  (1977)  developed  a  command 
language  for  vision  in  which  they  were  able  to  specify  solutions  to  the 
normal  range  of  low-level  vision  tasks — removing  noise,  finding  and  measuring 
blobs, following lines, detecting vertices and so on. Although emulator
overhead slowed their algorithms down by factors ranging from a thousandfold
to ten-thousandfold, they were still beating standard sequential algorithms
in real time. The reason, it turned out, was that without a language in
which to express parallelism it is not easy to acquire the mental set for
seeing simple fast ways of doing these things. In the next generation of
knowledge-based systems, incorporation of versatile and adaptive array
processors for vision and other perceptual tasks will be a necessity, a point
of confluence with the closely related field of robotics.


    ["region,"  area, compactness]

    ["holeset," number of holes]

    ["hole,"    area, compactness]

    ["outline," no. of segments, no. of internal corners,
                no. of external corners, no. of straight lines]

    ["segment," angle, curvature, length]


Figure 14.  The diagram depicts the descriptive structures used in the Edinburgh
robot project to encode diagnostic information about solid objects
viewed by the computer-controlled robot through a TV camera. The
slots in these structures could only be filled after a variety of
pre-processing routines had acted to eliminate noise in the picture,
identify optically homogeneous regions, to find, trace and segment
boundaries, and to perform various measurements on the primitive
features thus isolated.

Mass  Production  of  Inscrutable  Patterns 

The  knowledge  engineer's  building  blocks  are  thus  patterns 
(descriptive  of  key  concepts  underlying  the  given  consultant  skill) .   A 
state-of-the-art  system  requires  many  hundreds  of  such  descriptive  patterns  to 
be  programmed,  and  the  current  cycle  of  development  envisages  thousands  or 
even  tens  of  thousands  for  complex  task  environments.   The  work  of  coding 
even  one  pattern  can  consume  many  programmer  weeks,  so  that  the  total  task 
appears  prohibitive. 

Accordingly, the knowledge engineer of the 1980s will not construct
his  own  building  blocks,  but  will  have  recourse  to  automated  systems  of 
pattern  synthesis.   Such  systems  already  exist.   They  must  be  equipped  with 
stocks  of  primitive  descriptors  appropriate  to  given  domains.   Pattern-synthesis 
is  then  induced  by  supplying  tutorial  specifications  in  the  form  of  examples 
and  counter-examples. 

Methods have recently been developed which can inductively synthesize
patterns from examples for a small fraction of the cost of programming them
by hand. When run on the machine in the form of classification programs,
machine-made patterns typically out-perform man-made ones both in accuracy
and execution cost. But these machine-efficient patterns turn out to be
conspicuously different from those developed by experts, and in general to
be somewhat inscrutable to humans. A methodology is therefore needed for
humanizing the man-machine channel.

A small number of salient phenomena can now be combined to yield
some rather strange conclusions about the systems towards which we are
moving.

Six  Facts  of  Tomorrow's  World 

1. Knowledge-based systems are memory-intensive rather than
processor-intensive. They will soon comprise thousands of stored rules per
system. 

2.  Costs  of  memory  relative  to  processor  costs  will  continue  to  decrease. 

3.  The  way  to  use  a  large  memory  as  the  basis  of  knowledgeable  behavior  is 
to  fill  it  with  patterns  descriptive  of  the  key  concepts  of  the  given 
knowledge-domain.   From  these  the  rule-bases  are  built. 

4.  It  is  becoming  possible  to  mass-produce  such  patterns  by  machine  more 
cheaply  than  by  programming. 

5.  The  resulting  patterns  are  highly  efficient  at  run-time  but  their  form 
tends  to  inscrutability  for  the  domain  specialist. 

6.  A  preliminary  look  has  indicated  that  in  some  cases  there  may  be 
transformations  capable  of  rendering  machine-optimized  patterns  into 
more  humanly  transparent  forms. 

An Architecture for Knowledge Engineering in the 1980s

Putting  all  of  the  above  together  we  can  identify  major  components 
of future systems. The resulting automated mining-and-refining plant for
human knowledge presents some bizarre, even awesome, features. To maintain
a secure grip on credulity I will expound it within the narrow framework of a
selected practical application, chosen for its attractive mix of combinatorial
complexity, susceptibility to knowledge-based approaches, and commercial
potential.   I  am  speaking  of  the  identification  of  organic  compounds  by 
industrial  chemists. 

The  skilled  chemist  performs  the  knowledge-based  computation  shown 
in  Figure  15.   This  computation  is  extremely  hard  to  simulate  by  program.   A 
decade  of  work  at  Stanford  University  by  the  DENDRAL  project  has  resulted  in 
a  system  proficient  at  identifying  straight-chain  aliphatic  compounds  and 
members  of  certain  classes  of  oestrogenic  steroids.   Such  expertise  is  too 
narrow  to  be  of  serious  interest  to  chemists.   Some  other  approach  is  needed. 

As a starting point take one giant lookup memory, shall we say
(conservatively) 10^12 bits, directly addressable. We wish to use it as
a dictionary of mass spectrogram patterns, a likely molecular
structure being entered against each pattern. Such dictionaries exist
in industrial use but are constructed by hand, and do not exceed 100,000
entries. If the computation is so hard, how can we compute the
entries for the dictionary in the first place?

We ask: "What about the inverse computation?" Programs certainly
do exist for predicting

    molecular structure  -->  mass spec. pattern

in reasonable time.


If  we  could  generate  exhaustively  and  irredundantly  the  complete  set  of 
molecular  structures  in  the  given  class  then  by  computing  for  each  structure 
its proper predicted pattern we could construct (somewhat back-handedly)
the desired dictionary. A suitable structure-generating program exists in the
form of Raymond Carhart's "CONGEN" [37].
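
The back-handed construction can be sketched in a few lines of Python. This
is a toy under loudly stated assumptions: `generate_structures` and
`predict_pattern` are hypothetical stand-ins written for this illustration,
not CONGEN and not any real spectrum-prediction program. "Structures" are
short linear chains over three atom types, counted once regardless of the
direction in which the chain is read, and the "pattern" is simply the set of
fragment masses obtained by breaking one bond at a time.

```python
from itertools import product

# Nominal atomic masses for the toy alphabet (illustrative only).
MASS = {"C": 12, "N": 14, "O": 16}


def generate_structures(length):
    """Yield every chain of the given length exactly once.

    A chain and its mirror image are the same structure, so only the
    lexicographically smaller of the two readings is kept.
    """
    for chain in product(sorted(MASS), repeat=length):
        if chain <= chain[::-1]:
            yield chain


def predict_pattern(chain):
    """Forward computation: structure -> predicted (toy) pattern."""
    fragments = set()
    for cut in range(1, len(chain)):
        fragments.add(sum(MASS[a] for a in chain[:cut]))
        fragments.add(sum(MASS[a] for a in chain[cut:]))
    return tuple(sorted(fragments))


def build_dictionary(length):
    """Invert the forward computation into a pattern -> structures table."""
    table = {}
    for structure in generate_structures(length):
        table.setdefault(predict_pattern(structure), []).append(structure)
    return table


dictionary = build_dictionary(3)
# Lookup is now a single table access; several structures may share a pattern.
print(dictionary.get((12, 16, 26, 30)))
```

The hard direction (pattern to structure) is never computed. It is obtained
by enumerating the easy direction exhaustively and storing the results, which
is why the scheme depends on very large, cheap memory.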

[Figure 15 diagram: empirical formula, mass spec. data, NMR data, infra-red
data and other data  -->  knowledge of the stability of chemical bonds as a
function of their local intra-molecular environments  -->  inferred molecular
structure.]

Figure 15. The experienced chemical consultant is able to
compute the molecular structure of an unknown
compound by applying his physico-chemical
knowledge, and heuristic rules of thumb, to
an assemblage of measurements performed on the
unknown compound.

Such an immensely powerful question-answering resource would
unfortunately be limited to answering what I have termed elsewhere "questions
of the first kind" (What is the value of f(x)?) without being able to tell
the chemist anything of the "why." This is the point at which the AI
specialist has to come back into the picture to deploy his inductive inference
machinery (software tools such as Michalski's INDUCE and Quinlan's ID3) to
compress parts of the dictionary into pattern-rule form. Where possible
he must also humanize for intelligibility. We end up, then, with a scenario
like that of Figure 16. The weakest link in the diagram, because the problem
has only recently been identified, let alone solved, is the "humanization
loop." Initial study suggests methods applicable in some cases, e.g. for
converting large homogeneous decision trees into hierarchically structured
collections of human-type rules. In other cases conversion of representation
may be thwarted by complexity considerations. Choice may then have to be
exercised between sacrificing the superior efficiency of the machine-made
algorithm, and equipping it with a simplified "cover story" for human use.
The possibility of enabling knowledge-based systems to handle "stories" in
this sense, for purposes both of input and of output, deserves study
and overlaps work such as that of Schank and Charniak on story-understanding
programs. We must not forget that before the introduction of writing, and
to some extent even to the present day, the chief means of encoding useful
knowledge has been through stories, proverbs and other mnemonic paraphernalia of
folk science. Machines too may have to be taught to handle these time-honored
summarizing structures.
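
To make the idea of converting a decision tree into rules concrete, here is
a minimal Python sketch. It performs only the simplest possible conversion,
one flat IF-THEN rule per leaf, and does not attempt the hierarchical
structuring mentioned above. The tree representation (a label at a leaf, a
tuple of attribute and branches at a node) and the example tree are
assumptions made for this illustration.

```python
def tree_to_rules(tree, conditions=()):
    """Flatten a decision tree into (conditions, conclusion) pairs.

    A leaf is a label; an internal node is (attribute, {value: subtree}).
    Each root-to-leaf path becomes one rule.
    """
    if not isinstance(tree, tuple):
        return [(conditions, tree)]
    attribute, branches = tree
    rules = []
    for value in sorted(branches):     # fixed order for reproducible output
        rules.extend(
            tree_to_rules(branches[value], conditions + ((attribute, value),))
        )
    return rules


def render(rule):
    """Print one rule in a form a domain specialist can read."""
    conditions, conclusion = rule
    if not conditions:
        return f"ALWAYS {conclusion}"
    premise = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
    return f"IF {premise} THEN {conclusion}"


# Hypothetical machine-made tree over invented descriptors.
TREE = ("ring", {
    "no": "non-member",
    "yes": ("oxygen", {"no": "non-member", "yes": "member"}),
})

RULES = tree_to_rules(TREE)
for rule in RULES:
    print(render(rule))
```

A flat listing of this kind grows with the number of leaves, so for a large
tree it is hardly more transparent than the tree itself. That is the
complexity obstacle noted above, and the reason hierarchical grouping of
rules, or a simplified "cover story," may be needed.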
Structure of MYCIN. Modularity and facilities for querying the knowledge-base.
Rule-acquisition  and  computer  induction:   Michalski's  INDUCE  program.   Patterns 
as  building  blocks.   Amount  of  programming  per  pattern.   Total  number  needed 
for  given  task  domains.   Costs  of  man-made  and  machine-made  patterns. 
Reliability.   Intelligibility.   Need  to  "humanize."  Novel  future  designs. 